SPGP: Structure Prototype Guided Graph Pooling

Author: Kim Sun
Lee Dohoon
Lee Sangseon
Piao Yinhua
Publication date: 16/09/2022

While graph neural networks (GNNs) have been successful in node classification and link prediction tasks on graphs, learning graph-level representations remains a challenge. For graph-level representation, it is important to learn both the representations of neighboring nodes (i.e., aggregation) and graph structural information. A number of graph pooling methods have been developed for this goal. However, most existing pooling methods utilize the k-hop neighborhood without considering explicit structural information in a graph. In this paper, we propose Structure Prototype Guided Pooling (SPGP), which utilizes prior graph structures to overcome this limitation. SPGP formulates graph structures as learnable prototype vectors and computes the affinity between nodes and prototype vectors. This leads to a novel node scoring scheme that prioritizes informative nodes while encapsulating the useful structures of the graph. Our experimental results show that SPGP outperforms state-of-the-art graph pooling methods on graph classification benchmark datasets in both accuracy and scalability.
Comment: 18 pages, 6 figures
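The node scoring idea in this abstract can be illustrated with a minimal sketch. It is not the SPGP implementation: the cosine affinity, the max-over-prototypes score, the top-k selection, and all function names are assumptions made for illustration, and the prototype vectors are fixed here rather than learned.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def prototype_scores(node_feats, prototypes):
    """Assumed scoring rule: a node's score is its best affinity to any prototype."""
    return [max(cosine(x, p) for p in prototypes) for x in node_feats]

def top_k_pool(node_feats, prototypes, k):
    """Return the indices of the k highest-scoring nodes, in ascending index order."""
    scores = prototype_scores(node_feats, prototypes)
    order = sorted(range(len(node_feats)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])

# Toy graph with four nodes and a single hypothetical prototype vector.
node_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
prototypes = [[1.0, 0.0]]
kept = top_k_pool(node_feats, prototypes, k=2)
print(kept)
```

In a trained model the prototypes would be parameters updated by gradient descent; the sketch only shows how an affinity-based score can drive node selection.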

arXiv.org e-Print Archive

Venn-diaNet : venn diagram based network propagation analysis framework for comparing multiple biological experiments

Author: Hur Benjamin
Kang Dongwon
Kim Sun
Lee Gung
Lee Sangseon
Moon Ji Hwan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 29/12/2019

Background The main research topic of this paper is how to compare multiple biological experiments using transcriptome data, where each experiment is designed and measured to compare control and treated samples. Comparison of multiple biological experiments is usually performed in terms of the number of differentially expressed genes (DEGs) in an arbitrary combination of biological experiments. This process is usually facilitated with a Venn diagram, but there are several issues when a Venn diagram is used to compare and analyze multiple experiments in terms of DEGs. First, current Venn diagram tools do not provide systematic analysis to prioritize genes. Because current tools generally do not focus on prioritizing genes, genes located in the segments of the Venn diagram (especially intersections) are usually difficult to rank. Second, elucidating the phenotypic difference from only the lists of DEGs and expression values is challenging when the experimental designs combine treatments. In experimental designs that aim to find the synergistic effect of a combination of treatments, such effects are very difficult to find without an informative system.
Results We introduce Venn-diaNet, a Venn diagram based analysis framework that uses network propagation on a protein-protein interaction network to prioritize genes from experiments that have multiple DEG lists. We suggest that the two issues can be effectively handled by ranking or prioritizing genes within segments of a Venn diagram. The user can easily compare multiple DEG lists with gene rankings, which are easy to understand and can also be coupled with additional analysis for the user's purposes. Our system provides a web-based interface to select seed genes in any area of a Venn diagram and then perform network propagation analysis to measure the influence of the selected seed genes in terms of a ranked list of DEGs.
Conclusions We suggest that our system can logically guide the selection of seed genes without additional prior knowledge, which frees the user from the seed selection issue of network propagation. We showed that Venn-diaNet can reproduce the research findings reported in the original papers, which compare two, three and eight experiments. Venn-diaNet is freely available at: http://biohealth.snu.ac.kr/software/venndianet
This publication has been funded by (i) the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF), the Ministry of Science and ICT (MSIT) (No.NRF-2017M3C4A7065887), (ii) the Collaborative Genome Program for Fostering New Post-Genome Industry of the National Research Foundation (NRF), the Ministry of Science and ICT (MSIT) (No.NRF2014M3C9A3063541), and (iii) a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), the Ministry of Health & Welfare, Republic of Korea (Grant number: HI15C3224).
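Network propagation from seed genes, as mentioned in this abstract, is commonly realized as a random walk with restart. The sketch below assumes that formulation; the abstract does not state which propagation variant Venn-diaNet uses, and the function name, restart probability, iteration count, and toy network are all hypothetical.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, iters=100):
    """Propagate probability mass from seed nodes over an undirected graph.

    adj maps each node to a list of its neighbors. Every node is assumed
    to have at least one neighbor; an isolated node would leak mass.
    """
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(iters):
        # Restart term: a fixed fraction of mass returns to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        # Walk term: each node splits its remaining mass among its neighbors.
        for n in nodes:
            if adj[n]:
                share = (1.0 - restart) * p[n] / len(adj[n])
                for m in adj[n]:
                    nxt[m] += share
        p = nxt
    return p

# Toy interaction network: a path A - B - C - D with A as the only seed gene.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(adj, seeds={"A"})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Genes closer to the seed receive higher scores, which is the property a ranking of DEGs around selected seed genes relies on.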

SNU Open Repository and Archive

StressGenePred: a twin prediction model architecture for classifying the stress types of samples and discovering stress-related genes in arabidopsis

Author: Ahn Hongryul
Hur Jihye
Jung Woosuk
Kang Dongwon
Kim Sun
Lee Chai-Jin
Lee Sangseon
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 22/12/2019

Background Recently, a number of studies have been conducted to investigate how plants respond to stress at the cellular and molecular level by measuring gene expression profiles over time. As a result, a set of time-series gene expression data for the stress response is available in databases. With these data, an integrated analysis of multiple stresses is possible, which identifies stress-responsive genes with higher specificity because considering multiple stresses can capture the effect of interference between stresses. To analyze such data, a machine learning model needs to be built.
Results In this study, we developed StressGenePred, a neural network-based machine learning method, to integrate time-series transcriptome data of multiple stress types. StressGenePred is designed to detect single stress-specific biomarker genes by using a simple feature embedding method, a twin neural network model, and Confident Multiple Choice Learning (CMCL) loss. The twin neural network model consists of a biomarker gene discovery model and a stress type prediction model that share the same logical layer to reduce training complexity. The CMCL loss is used to make the twin model select biomarker genes that respond specifically to a single stress. In experiments using Arabidopsis gene expression data for four major environmental stresses (heat, cold, salt, and drought), StressGenePred classified the types of stress more accurately than the limma feature embedding method and the support vector machine and random forest classification methods. In addition, StressGenePred discovered known stress-related genes with higher specificity than the Fisher method.
Conclusions StressGenePred is a machine learning method for identifying stress-related genes and predicting stress types for an integrated analysis of multiple stress time-series transcriptome data. This method can be applied to other phenotype-gene association studies.
This work and publication costs were supported by the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT (No. NRF2017M3C4A7065887), and the Collaborative Genome Program for Fostering New Post-Genome Industry of the National Research Foundation (NRF) funded by the Ministry of Science and ICT (MSIT) (No. NRF-2014M3C9A3063541). This work was supported for W.J. by the Agenda program (No. PJ014307), Rural Development Administration of the Republic of Korea.

SNU Open Repository and Archive

Subtype-specific CpG island shore methylation and mutation patterns in 30 breast cancer cell lines

Author: A Akalin
A Doi
A Lachmann
AG Rivenbark
AK Smith
B Langmead
B Schilling-Tóth
C Stirzaker
CF Mugal
CGA Network
D Sproul
F Brenet
F Krueger
GC Hon
GK Smyth
H Chae
H Li
H Li
Heejoon Chae
I Keshet
IR Watson
J An
J An
J Hornberger
J Xia
JK Rhee
JS Parker
JY Low
K Conway
K Holm
Kenneth P. Nephew
LA Bovolenta
LB Alexandrov
M Lienhard
M Szyf
NG Bediaga
OA Stefansson
P Gascard
RM Neve
RR Jadhav
S Goicoechea
S Kamalakaran
S Nik-Zainal
Sangseon Lee
SJ Schnitt
SM Lee
Sun Kim
T Fleischer
T Sorlie
TJ Chuang
V Matys
X Pan
X Rao
X Yang
Y Yang
YR Chin
Z Lasabova
Publication venue: 'Springer Science and Business Media LLC'

Crossref

PINTnet: construction of condition-specific pathway interaction network by computing shortest paths on weighted PPI

Author: A Franceschini
A Kawashima
A Subramanian
AE Karnoub
AL Tarca
AN Tegge
BN Sheikh
C Bole-Feysot
C Hsu
C Huang
CA Jamieson
CA Pataki
CR Velasco
D Guardavaccaro
D Nam
D Pan
D Sukhtankar
E Aksamitiene
E Glaab
E Potlukova
E Toubi
F Assche
F Rapaport
GB Jang
H Kim
H Shin
I Medina
I Rivals
J Lever
J Wang
J Zwerina
JA Parsons
JC Marioni
JD Ashwell
JH Nielsen
Ji Hwan Moon
JS Rawlings
JT Buijs
Kyuri Jo
L Vadlakonda
M Cordenonsi
M Donato
M Feldmann
M Francesconi
M Kanehisa
M Matsumoto
MD Deel
MJ Brooks
N Akeno
N Dey
N Itasaki
N Paulmann
NA Bosma
P Shannon
PC Hsu
RL Sorenson
RV Iozzo
S Lim
S Rieck
Sangseon Lee
Sangsoo Lim
Seokjun Seo
SH Wang
SJ Merrill
Sun Kim
TA Guise
V Bernard
X Guo
Y Chen
Y Huang
Y Li
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Sparse Structure Learning via Graph Neural Networks for Inductive Document Classification

Author: Kim Sun
Lee Dohoon
Lee Sangseon
Piao Yinhua
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 21/03/2022

Recently, graph neural networks (GNNs) have been widely used for document classification. However, most existing methods are based on static word co-occurrence graphs without sentence-level information, which poses three challenges: (1) word ambiguity, (2) word synonymity, and (3) dynamic contextual dependency. To address these challenges, we propose a novel GNN-based sparse structure learning model for inductive document classification. Specifically, a document-level graph is initially generated by a disjoint union of sentence-level word co-occurrence graphs. Our model collects a set of trainable edges connecting disjoint words between sentences, and employs structure learning to sparsely select edges with dynamic contextual dependencies. Graphs with sparse structure can jointly exploit local and global contextual information in documents through GNNs. For inductive learning, the refined document graph is further fed into a general readout function for graph-level classification and optimization in an end-to-end manner. Extensive experiments on several real-world datasets demonstrate that the proposed model outperforms most state-of-the-art results, and reveal the necessity of learning sparse structures for each document.
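The graph construction step described in this abstract (a disjoint union of sentence-level co-occurrence graphs, plus candidate edges between sentences) can be sketched as follows. This is an illustration only: the window size, the node labelling by (sentence id, word), and the function names are assumptions, and the learned sparse selection of cross-sentence edges is not shown.

```python
from itertools import combinations

def sentence_graph(sent_id, tokens, window=2):
    """Build one sentence-level co-occurrence graph.

    Nodes are (sentence id, word) pairs, so the same word appearing in
    two sentences yields two distinct nodes. Each word is linked to the
    next `window` words of its sentence.
    """
    nodes = {(sent_id, w) for w in tokens}
    edges = set()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + 1 + window]:
            if v != w:
                edges.add(frozenset({(sent_id, w), (sent_id, v)}))
    return nodes, edges

def document_graph(sentences, window=2):
    """Disjoint union of the sentence graphs: no edge crosses sentences."""
    nodes, edges = set(), set()
    for sent_id, tokens in enumerate(sentences):
        n, e = sentence_graph(sent_id, tokens, window)
        nodes |= n
        edges |= e
    return nodes, edges

def candidate_cross_edges(nodes):
    """All node pairs from different sentences; a model would select few of these."""
    return {
        frozenset({a, b})
        for a, b in combinations(sorted(nodes), 2)
        if a[0] != b[0]
    }

sentences = [["bank", "river"], ["bank", "loan"]]
nodes, edges = document_graph(sentences)
candidates = candidate_cross_edges(nodes)
print(len(nodes), len(edges), len(candidates))
```

Keeping "bank" as two separate nodes is what lets a later selection step treat the two occurrences differently, which relates to the word ambiguity challenge named in the abstract.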

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

SpliceHetero: An information theoretic approach for measuring spliceomic intratumor heterogeneity from bulk tumor RNA-seq.

Author: Minsu Kim
Sangseon Lee
Sangsoo Lim
Sun Kim
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2019

MOTIVATION: Intratumor heterogeneity (ITH) represents the diversity of cell populations that make up cancer tissue. The level of ITH in a tumor is usually measured by a genomic variation profile, such as copy number variation and somatic mutation. However, a recent study has identified ITH at the transcriptome level and suggested that ITH at gene expression levels is useful for predicting prognosis. Measuring ITH levels at the spliceome level is a natural extension. There are serious technical challenges in measuring spliceomic ITH (sITH) from bulk tumor RNA sequencing (RNA-seq) due to the complex splicing patterns.
RESULTS: We propose an information-theoretic method to measure the sITH of bulk tumors to overcome the above challenges. This method has been extensively tested in experiments using synthetic data, xenograft tumor data, and TCGA pan-cancer data. As a result, we showed that sITH is closely related to cancer progression and clonal heterogeneity, along with clinically significant features such as cancer stage, survival outcome and PAM50 subtype. As far as we know, this is the first study to define ITH at the spliceome level. This method can greatly improve the understanding of the cancer spliceome and has great potential as a diagnostic and prognostic tool.

Directory of Open Access Journals

Author Correction: Subnetwork representation learning for discovering network biomarkers in predicting lymph node metastasis in early oral cancer (Scientific Reports, (2021), 11, 1, (23992), 10.1038/s41598-021-03333-5)

Author: Kim Minsu
Kim Sun
Lee Doh Young
Lee Sangseon
Lim Sangsoo
Publication venue: Nature Publishing Group
Publication date: 01/01/2022

© The Author(s) 2022. In the original version of this Article, Doh Young Lee was omitted as a corresponding author. Correspondence and requests for materials should also be addressed to [email protected]

SNU Open Repository and Archive

PubMed Central

A probabilistic model for pathway-guided gene set selection

Author: Kim Inyoung
Kim Sun
Kim Youngkuk
Lee Sangseon
Namkoong Hugh
Publication venue: Institute of Electrical and Electronics Engineers Inc.
Publication date: 01/01/2021

© 2021 IEEE. Breast cancer is classified into five intrinsic subtypes, with differing treatment methods and prognoses. Therefore, accurate identification of subtypes from patient transcriptome data is essential. Many gene signatures, including PAM50, have been developed to classify breast cancer subtypes. However, existing gene selection methods do not utilize biological pathways. Gene signature selection using biological pathways can explain signature genes in terms of biological functions. Thus, we propose a probabilistic model for pathway-guided gene set selection using gene expression data. First, we defined gene and pathway factors based on gene expression and pathway activation levels, and calculated the posterior probability. Second, we adopted the prediction strength to guide gene set selection. Third, the gene set was selected using the posterior probability and prediction strength values. Finally, on evaluating the selected gene set, it was experimentally confirmed that our gene set performed better on classification tasks than the PAM50 gene set, a gene set produced by the XGBoost classifier, and a random gene set. Among the genes selected by our method, it was confirmed that the genes included in the cell cycle and circadian rhythm pathways showed different expression patterns for each breast cancer subtype. Our selected gene set exhibited biological significance in terms of pathway activation.

SNU Open Repository and Archive

Risk Stratification for Breast Cancer Patient by Simultaneous Learning of Molecular Subtype and Survival Outcome Using Genetic Algorithm-Based Gene Set Selection

Author: Kim Sun
Koo Bonil
Lee Dohoon
Lee Sangseon
Lee Sunho
Sung Inyoung
Publication venue: Multidisciplinary Digital Publishing Institute (MDPI)
Publication date: 01/08/2022

Simple Summary Patient stratification is clinically important because it allows us to understand the characteristics of a group and establish treatment strategies for it. Transcriptomic data play an important role in determining molecular subtypes and predicting survival. In the case of breast cancer, although the order of prognosis according to molecular subtypes is well known, there is heterogeneity even within a subtype. Therefore, patient stratification considering both molecular subtypes and survival outcomes is required. In this study, a methodology to handle this problem is presented. A genetic algorithm is used to select a set of genes, and a risk score is assigned to each patient using their expression levels. According to the risk score, patients are ordered and stratified considering molecular subtypes and survival outcomes. Consequently, informative genes for patient stratification with respect to both aspects could be nominated, and the usefulness of the risk score was shown through comparison with other indicators.
Patient stratification is a clinically important task because it allows us to establish and develop efficient treatment strategies for particular groups of patients. Molecular subtypes have been successfully defined using transcriptomic profiles, and they are used effectively in clinical practice, e.g., PAM50 subtypes of breast cancer. Survival prediction contributes to understanding diseases and identifying genes related to prognosis. It is desirable to stratify patients considering these two aspects simultaneously. However, there are no methods for patient stratification that consider molecular subtypes and survival outcomes at once. Here, we propose a methodology to deal with the problem. A genetic algorithm is used to select a gene set from transcriptome data, and the expression quantities of the selected genes are utilized to assign a risk score to each patient. The patients are ordered and stratified according to the score. A gene set was selected by our method on a breast cancer cohort (TCGA-BRCA), and we examined its clinical utility using an independent cohort (SCAN-B). In this experiment, our method was successful in stratifying patients with respect to both molecular subtype and survival outcome. We demonstrated that the orders of patients were consistent across repeated experiments, and prognostic genes were successfully nominated. Additionally, it was observed that the risk score can be used to evaluate the molecular aggressiveness of individual patients.

SNU Open Repository and Archive

Directory of Open Access Journals

PubMed Central